On Robust Mahalanobis Distance Issued from Fast MCD and MVV
Abstract
In modern activities such as banking, homeland security, information transportation, and telecommunications, people work with large, high-dimensional data sets. The higher the dimension, however, the higher the probability that outliers are present. Detecting outliers in high-dimensional multivariate data is a challenging task that calls for robust estimates of location and scale. One of the primary problems in robust estimation of location and scale is ensuring that the estimators are both highly robust and computationally efficient. The most popular and widely used highly robust method for estimating these parameters is the fast minimum covariance determinant (FMCD). Although it has desirable statistical properties such as a high breakdown point, affine equivariance, and a bounded influence function, its computational efficiency, which is as important as its effectiveness, is lower when the data sets are of high dimension. This is a direct consequence of two choices: the use of the Mahalanobis squared distance (MSD), which requires inverting the covariance matrix, in the data-ordering process, and the use of the covariance determinant as the objective function. Covariance matrix inversion and the covariance determinant have the same high order of computational complexity, O(p^3), where p is the number of variables. In this paper we use the vector variance, introduced by Djauhari (2007) as a measure of multivariate dispersion, to define an alternative objective function that increases computational efficiency. Through simulation experiments we compare the effectiveness and computational efficiency of the MVV algorithm, introduced by Herwindiati et al. (2007), with those of the FMCD algorithm. The two algorithms have the same structure and differ only in their objective functions: FMCD minimizes the covariance determinant, whereas MVV minimizes the vector variance. The simulation experiments show that MVV is as effective as FMCD; more precisely, the two algorithms give the same results. Furthermore, MVV is computationally more efficient than FMCD.
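The contrast between the two objective functions can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes that the vector variance of a covariance matrix S is Tr(S^2), the sum of its squared entries (the abstract does not state the formula), and it simplifies the search to random h-subsets refined by concentration steps. The names `msd`, `vector_variance`, `covariance_determinant`, `concentrate`, and `robust_fit` are hypothetical.

```python
import numpy as np


def msd(X, mu, S):
    """Mahalanobis squared distance of each row of X from mu under S."""
    D = X - mu
    # Solve S z = d for every row instead of forming the explicit inverse.
    return np.einsum("ij,ij->i", D, np.linalg.solve(S, D.T).T)


def covariance_determinant(S):
    """FMCD-style objective: needs a factorization, O(p^3) in general."""
    return np.linalg.det(S)


def vector_variance(S):
    """Assumed MVV-style objective: Tr(S^2), the sum of squared entries, O(p^2)."""
    return np.sum(S * S)


def concentrate(X, h, start_idx, max_iter=50):
    """Repeat: fit mean/covariance on the subset, keep the h closest points by MSD."""
    idx = np.sort(start_idx)
    for _ in range(max_iter):
        mu = X[idx].mean(axis=0)
        S = np.cov(X[idx], rowvar=False)
        new_idx = np.sort(np.argsort(msd(X, mu, S))[:h])
        if np.array_equal(new_idx, idx):  # subset no longer changes
            break
        idx = new_idx
    # Recompute the estimates on the final subset.
    return idx, X[idx].mean(axis=0), np.cov(X[idx], rowvar=False)


def robust_fit(X, h, objective, n_starts=10, seed=0):
    """Run several random starts; keep the subset with the smallest objective."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        start = rng.choice(len(X), size=h, replace=False)
        idx, mu, S = concentrate(X, h, start)
        score = objective(S)
        if best is None or score < best[0]:
            best = (score, idx, mu, S)
    return best[1], best[2], best[3]


# Toy data (an assumption for illustration): 180 clean points, 20 shifted outliers.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
X[180:] += 20.0

for objective in (covariance_determinant, vector_variance):
    idx, mu, S = robust_fit(X, h=150, objective=objective)
    robust_d2 = msd(X, mu, S)  # robust distances for every observation
    print(objective.__name__, "largest index in subset:", idx.max())
```

In this sketch both variants still order the data by MSD, in line with the statement that the two algorithms share the same structure; only the function used to rank candidate subsets changes.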
Similar resources
A Robust Estimation of Location and Scatter
Statisticians increasingly face the task of analyzing large and high dimension multivariate data sets. This is due to the advances in computer technology which have greatly facilitated the collection of large data sets and, on the other hand, to the fact that most statistical experiments are multivariate in nature. One of the primary problems encountered in this task is robust estimation of loc...
A Fast Algorithm for the Minimum Covariance Determinant Estimator
The minimum covariance determinant (MCD) method of Rousseeuw (1984) is a highly robust estimator of multivariate location and scatter. Its objective is to find h observations (out of n) whose covariance matrix has the lowest determinant. Until now applications of the MCD were hampered by the computation time of existing algorithms, which were limited to a few hundred objects in a few dimensions. ...
The Distribution of Robust Distances
Mahalanobis-type distances in which the shape matrix is derived from a consistent, high-breakdown robust multivariate location and scale estimator have an asymptotic chi-squared distribution as is the case with those derived from the ordinary covariance matrix. For example, Rousseeuw’s minimum covariance determinant (MCD) is a robust estimator with a high breakdown. However, even in quite large ...
Nonsingular Robust Covariance Estimation in Multivariate Outlier Detection
Rousseeuw’s minimum covariance determinant (MCD) method is a highly robust estimator of multivariate mean and covariance. In practice, the MCD covariance estimator may be singular. However, a nonsingular covariance estimator is required to calculate the Mahalanobis distance. In order to fix this singular problem, we propose an improved version of the MCD estimator, which is a combination of the...
Applying the Mahalanobis-Taguchi System to Vehicle Ride
The Mahalanobis Taguchi System is a diagnosis and forecasting method for multivariate data. Mahalanobis distance is a measure based on correlations between the variables and different patterns that can be identified and analyzed with respect to a base or reference group. The Mahalanobis Taguchi System is of interest because of its reported accuracy in forecasting small, correlated data sets. Th...